1 Information retrieval session 8: efficiency: Efficient query evaluation using a two-level
retrieval process

Andrei Z. Broder, David Carmel, Michael Herscovici, Aya Soffer, Jason Zien

November 2003 Proceedings of the twelfth international conference on Information
and knowledge management
Publisher: ACM Press 

Full text available: pdf(248.95 KB) Additional Information: full citation, abstract, references, index terms

We present an efficient query evaluation method based on a two level approach: at the 
first level, our method iterates in parallel over query term postings and identifies
candidate documents using an approximate evaluation taking into account only partial 
information on term occurrences and no query independent factors; at the second level, 
promising candidates are fully evaluated and their exact scores are computed. The 
efficiency of the evaluation process can be improved signific ... 



Keywords: WAND, document-at-a-time, efficient query evaluation 



2 A novel method for the evaluation of Boolean query effectiveness across a wide
operational range
Eero Sormunen

July 2000 Proceedings of the 23rd annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(796.61 KB) Additional Information: full citation, abstract, references, citings, index terms

Traditional methods for the system-oriented evaluation of Boolean IR systems suffer from
validity and reliability problems. Laboratory-based research neglects the searcher and 
studies suboptimal queries. Research on operational systems fails to make a distinction 
between searcher performance and system performance. This approach is neither capable 
of measuring performance at standard points of operation (e.g. across R0.0-R1.0).

A new laboratory-based evaluation method for Boolean IR sy ... 

Keywords: evaluation (general), structured queries, test collections, testing methodology 
3 Using local optimality criteria for efficient information retrieval with redundant
information filters 
Neil C. Rowe 

April 1996 ACM Transactions on Information Systems (TOIS), volume 14 issue 2
Publisher: ACM Press 

Full text available: pdf(2.21 MB) Additional Information: full citation, abstract, references, index terms

We consider information retrieval when the data (for instance, multimedia) is
computationally expensive to fetch. Our approach uses "information filters" to 
considerably narrow the universe of possibilities before retrieval. We are especially 
interested in redundant information filters that save time over more general but more 
costly filters. Efficient retrieval requires that decisions must be made about the necessity, 
order, and concurrent processing of proposed filte ... 

Keywords: Boolean algebra, conjunction, filters, natural language, optimization, queries 



4 Interaction of query evaluation and buffer management for information retrieval
Bjorn T. Jonsson, Michael J. Franklin, Divesh Srivastava 

June 1998 ACM SIGMOD Record, Proceedings of the 1998 ACM SIGMOD international
conference on Management of data SIGMOD '98, volume 27 issue 2
Publisher: ACM Press

Full text available: pdf(1.81 MB) Additional Information: full citation, abstract, references, citings, index terms

The proliferation of the World Wide Web has brought information retrieval (IR) techniques
to the forefront of search technology. To the average computer user, "searching" now 
means using IR-based systems for finding information on the WWW or in other document 
collections. IR query evaluation methods and workloads differ significantly from those 
found in database systems. In this paper, we focus on three such differences. First, due to 
the inherent fuzziness of the natural langua ...

5 XIRQL: a query language for information retrieval in XML documents
Norbert Fuhr, Kai Großjohann

September 2001 Proceedings of the 24th annual international ACM SIGIR conference 
on Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(234.90 KB) Additional Information: full citation, abstract, references, citings, index terms

Based on the document-centric view of XML, we present the query language XIRQL. 
Current proposals for XML query languages lack most IR-related features, which are 
weighting and ranking, relevance-oriented search, datatypes with vague predicates, and 
semantic relativism. XIRQL integrates these features by using ideas from logic-based
probabilistic IR models, in combination with concepts from the database area. For 
processing XIRQL queries, a path algebra is presented, that also serves as ... 

6 Fast evaluation of structured queries for information retrieval
Eric W. Brown

July 1995 Proceedings of the 18th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Publisher: ACM Press 

Full text available: pdf(1.15 MB) Additional Information: full citation, references, citings, index terms
7 A probabilistic relational algebra for the integration of information retrieval and
database systems
Norbert Fuhr, Thomas Rolleke

January 1997 ACM Transactions on Information Systems (TOIS), volume 15 issue 1

Publisher: ACM Press 

Full text available: pdf(2.10 MB) Additional Information: full citation, abstract, references, citings, index terms, review

We present a probabilistic relational algebra (PRA) which is a generalization of standard 
relational algebra. In PRA, tuples are assigned probabilistic weights giving the probability 
that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of 
the result of a PRA expression always conform to the underlying probabilistic model. We 
also show for which expressions extensional semantics yields the same results. 
Furthermore, we discuss complexity issues and indicate p ... 

Keywords: hypertext retrieval, imprecise data, logical retrieval model, probabilistic 
retrieval, relational data model, uncertain data, vague predicates 



8 The effect multiple query representations on information retrieval system
performance
Nicholas J. Belkin, C. Cool, W. Bruce Croft, James P. Callan

July 1993 Proceedings of the 16th annual international ACM SIGIR conference on
Research and development in information retrieval
Publisher: ACM Press 

Full text available: pdf(883.37 KB) Additional Information: full citation, references, citings, index terms



9 Join queries with external text sources: execution and optimization techniques 
Surajit Chaudhuri, Umeshwar Dayal, Tak W. Yan

May 1995 ACM SIGMOD Record, Proceedings of the 1995 ACM SIGMOD international
conference on Management of data SIGMOD '95, volume 24 issue 2 

Publisher: ACM Press 

Full text available: pdf(1.49 MB) Additional Information: full citation, abstract, references, citings, index terms

Text is a pervasive information type, and many applications require querying over text 
sources in addition to structured data. This paper studies the problem of query processing 
in a system that loosely integrates an extensible database system and a text retrieval 
system. We focus on a class of conjunctive queries that include joins between text and 
structured data, in addition to selections over these two types of data. We adapt 
techniques from distributed query processing and introduce a novel ... 

10 Query Optimization in Database Systems
Matthias Jarke, Jurgen Koch

June 1984 ACM Computing Surveys (CSUR), volume 16 issue 2 
Publisher: ACM Press 

Full text available: pdf(2.84 MB) Additional Information: full citation, references, citings, index terms



11 Query optimization for selections using bitmaps
Ming-Chuan Wu 

June 1999 ACM SIGMOD Record, Proceedings of the 1999 ACM SIGMOD international
conference on Management of data SIGMOD '99, volume 28 issue 2 
Publisher: ACM Press

Full text available: pdf(1.54 MB) Additional Information: full citation, abstract, references, citings, index terms

Bitmaps are popular indexes for data warehouse (DW) applications and most database
management systems offer them today. This paper proposes query optimization 
strategies for selections using bitmaps. Both continuous and discrete selection criteria are 
considered. Query optimization strategies are categorized into static and dynamic. Static 
optimization strategies discussed are the optimal design of bitmaps, and algorithms based 
on tree and logical reduction ... 

12 XIRQL: An XML query language based on information retrieval concepts
Norbert Fuhr, Kai Großjohann

April 2004 ACM Transactions on Information Systems (TOIS), volume 22 issue 2 
Publisher: ACM Press 

Full text available: pdf(281.91 KB) Additional Information: full citation, abstract, references, citings, index terms

XIRQL ("circle") is an XML query language that incorporates imprecision and vagueness 
for both structural and content-oriented query conditions. The corresponding uncertainty 
is handled by a consistent probabilistic model. The core features of XIRQL are (1) 
document ranking based on index term weighting, (2) specificity-oriented search for 
retrieving the most relevant parts of documents, (3) datatypes with vague predicates for 
dealing with specific types of content and (4) structural vagueness f ... 

Keywords: Path algebra, XML, XQuery, probabilistic retrieval, ranked retrieval, vague 
predicates 



13 SIGIR 2 - Information retrieval systems: Design considerations for a Boolean search
system with automatic relevance feedback processing
Jon T. Rickman

August 1972 Proceedings of the ACM annual conference - Volume 1 

Publisher: ACM Press 

Full text available: pdf(290.56 KB) Additional Information: full citation, abstract, references

Two major problems are considered in the design of a Boolean search system with 
automatic relevance feedback processing. The first problem is how terms should be 
connected with Boolean logic when constructing a modified query. A number of fixed 
formats are presented and evaluated. The second problem is how to select terms for entry 
into the modified query. The stability of the system is found to be quite sensitive to its
method of selecting terms. Only very restrictive term selection methods are ... 

Keywords: Boolean, automatic relevance feedback, interactive retrieval, iterative search 
techniques, on-line, query modification 



14 Abstracts of Articles in the Information Retrieval Area Selected by Gerard Salton 
September 1986 ACM SIGIR Forum, volume 21 issue 1-2
Publisher: ACM Press 

Full text available: pdf(1.10 MB) Additional Information: full citation
15 Special issue in parallelism in database systems: Query processing and inverted
indices in shared-nothing text document information retrieval systems
Anthony Tomasic, Hector Garcia-Molina

July 1993 The VLDB Journal - The International Journal on Very Large Data Bases,
Volume 2 Issue 3
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(1.65 MB) Additional Information: full citation, abstract, references, citings

The performance of distributed text document retrieval systems is strongly influenced by
the organization of the inverted text. This article compares the performance impact on 
query processing of various physical organizations for inverted lists. We present a new 
probabilistic model of the database and queries. Simulation experiments determine those 
variables that most strongly influence response time and throughput. This leads to a set 
of design trade-offs over a wide range of hardware configur ... 

Keywords: file organization, full text information retrieval, inverted file, inverted index,
performance, query processing, shared-nothing, striping 



16 An algorithm for optimized Boolean evaluation in information management systems 
Gerard D. Finn 

May 1984 Communications of the ACM, volume 27 issue 5 
Publisher: ACM Press 

Full text available: pdf(742.77 KB) Additional Information: full citation, abstract, index terms, review

In cases where simple data validation techniques are inadequate and optimization policies 
relatively complex (e.g., in health and medical systems), a Boolean optimization algorithm 
can be used to report errors accurately and unambiguously. The algorithm is presented in 
the context of a data-validating software module that uses an LR(1)-parser. The
algorithm's precision makes it of potential use for the retrieval of records that nearly 
satisfy a query. ... 

Keywords: Boolean optimization algorithm, LR parsing, data dictionary, data retrieval, 
data validation, health information systems, information management systems



17 Versioning a full-text information retrieval system
Peter G. Anick, Rex A. Flynn 

June 1992 Proceedings of the 15th annual international ACM SIGIR conference on
Research and development in information retrieval
Publisher: ACM Press 

Full text available: pdf(1.53 MB) Additional Information: full citation, abstract, references, citings, index terms

In this paper, we present an approach to the incorporation of object versioning into a 
distributed full-text information retrieval system. We propose an implementation based on 
"partially versioned" index sets, arguing that its space overhead and query-time
performance make it suitable for full-text IR, with its heavy dependence on inverted
indexing. We develop algorithms for computing both historical queries and time range 
queries and show how these algorithms can be applied to ... 

18 Optimizing queries over multimedia repositories
Surajit Chaudhuri, Luis Gravano 

June 1996 ACM SIGMOD Record, Proceedings of the 1996 ACM SIGMOD international
conference on Management of data SIGMOD '96, volume 25 issue 2
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index terms

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) 
are becoming increasingly common. A selection on these attributes will typically produce
not just a set of objects, as in the traditional relational query model (filtering), but also a
grade of match associated with each object, indicating how well the object matches the
selection condition (ranking). Also, multimedia repositories may allow access to the
attributes of each object only ... 

19 Information storage and retrieval: a survey and functional description
Jack Minker

September 1977 ACM SIGIR Forum, volume 12 issue 2
Publisher: ACM Press 

Full text available: pdf(5.14 MB) Additional Information: full citation, abstract, references

Information Storage and Retrieval (IS&R) encompasses a broad scope of topics ranging 
from basic techniques for accessing data to sophisticated approaches for the analysis of 
natural language text and the deduction of information. Within the field, three general 
areas of investigation can be distinguished not only by their subject matter but also by
the types of individuals presently interested in them: (1) Document retrieval, (2)
Generalized data management, and (3) Question-answering. A functional ...

Keywords: automatic indexing, data management, data structures, deductive search, 
information retrieval, natural language, problem solving, question-answering, relational 
data systems, theorem proving 




20 Retrieval effectiveness of an ontology-based model for information selection 
Latifur Khan, Dennis McLeod, Eduard Hovy 

January 2004 The VLDB Journal - The International Journal on Very Large Data
Bases, Volume 13 Issue 1
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(278.74 KB) Additional Information: full citation, abstract, index terms

Technology in the field of digital media generates huge amounts of nontextual 
information, audio, video, and images, along with more familiar textual information. The
potential for exchange and retrieval of information is vast and daunting. The key problem 
in achieving efficient and user-friendly retrieval is the development of a search 
mechanism to guarantee delivery of minimal irrelevant information (high precision) while 
insuring relevant information is not overlooked (high recall). The tradi ... 

Keywords: Audio, Metadata, Ontology, Precision, Recall, SQL 